Lenin Mookiah, Tennessee Tech University, lmookiah42@students.tntech.edu PRIMARY
Prof. William (Bill) Eberle, Tennessee Tech University, WEberle@tntech.edu
Prof. Larry Holder, Washington State University, holder@wsu.edu
Student Team: YES
Graph Based Anomaly Detection (GBAD), developed by the Big Data and Knowledge Discovery Group at Tennessee Tech University.
Approximately how many hours were spent working on this submission in total?
80 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES
Video:
https://www.youtube.com/watch?v=nQUhNUF0YQA
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC2.1 – Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like?
GBAD Introduction:
The GBAD graph-based anomaly detection tool suite discovers structural anomalies in data represented as a graph. It uses the Minimum Description Length (MDL) principle to identify the normative pattern, that is, the pattern that minimizes the number of bits needed to describe the input graph once the graph has been compressed by that pattern. In this experiment, the normative pattern corresponds to the most likely normal (daily) activity of a given employee.
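As a rough illustration of the compression idea, the sketch below scores candidate patterns with a size-based stand-in for description length (vertex count plus edge count) instead of a true bit-level encoding. The `Pattern` record, the candidate values, and the graph size are hypothetical and chosen only for illustration; this is not GBAD's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A candidate substructure (hypothetical record for illustration)."""
    name: str
    vertices: int
    edges: int
    instances: int  # non-overlapping occurrences in the input graph


def size(vertices: int, edges: int) -> int:
    # Size-based stand-in for description length: vertices plus edges.
    return vertices + edges


def compressed_cost(p: Pattern, graph_vertices: int, graph_edges: int) -> int:
    """Cost of the pattern itself plus the graph after each instance
    of the pattern has been collapsed into a single vertex."""
    remaining_vertices = graph_vertices - p.instances * (p.vertices - 1)
    remaining_edges = graph_edges - p.instances * p.edges
    return size(p.vertices, p.edges) + size(remaining_vertices, remaining_edges)


def normative_pattern(candidates, graph_vertices, graph_edges):
    # The normative pattern is the candidate with the lowest total cost.
    return min(
        candidates,
        key=lambda p: compressed_cost(p, graph_vertices, graph_edges),
    )


# Made-up numbers: a graph with 100 vertices and 120 edges.
candidates = [
    Pattern("home->work", vertices=2, edges=1, instances=14),
    Pattern("home->work->lunch->work", vertices=4, edges=3, instances=10),
    Pattern("rare detour", vertices=5, edges=4, instances=1),
]
best = normative_pattern(candidates, graph_vertices=100, graph_edges=120)
print(best.name)
```

Under these assumptions, a pattern that is both reasonably large and frequent compresses the graph best, which is why a full daily routine can win over a smaller but more frequent fragment.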
The GBAD probability algorithm uses the MDL evaluation technique to discover the normative pattern in a graph, but instead of examining all instances for similarity, this approach examines all extensions to the normative substructure (pattern), looking for extensions with the lowest probability. In other words, the algorithm examines the probability of extensions to the normative pattern to determine if there is an instance that includes edges and vertices that are probabilistically less likely than other possible extensions.
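The idea of looking for the least likely extension can be sketched in a few lines. The example below is a simplified illustration, not GBAD's code: it assumes the extensions of a normative pattern have already been collected as hypothetical `(edge_label, vertex_label)` pairs, and the counts are invented.

```python
from collections import Counter


def rarest_extensions(extensions):
    """Return (probability table, list of lowest-probability extensions).

    `extensions` holds one (edge_label, vertex_label) pair for each observed
    one-edge extension of a normative-pattern instance.
    """
    counts = Counter(extensions)
    total = sum(counts.values())
    probs = {ext: n / total for ext, n in counts.items()}
    lowest = min(probs.values())
    rare = sorted(ext for ext, p in probs.items() if p == lowest)
    return probs, rare


# Hypothetical extensions of a "home -> work" pattern over several days.
observed = (
    [("then", "Guy's Gyros")] * 6
    + [("then", "Hippokampos")] * 3
    + [("then", "n utmana st")] * 1
)
probs, rare = rarest_extensions(observed)
for ext in rare:
    print(ext, round(probs[ext], 2))
```

The extension that occurs least often relative to all extensions of the same pattern is the one reported as anomalous.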
Summary
A typical GAStech employee visits "Guy's Gyros" for meals and, a couple of
times, visits coffee shops (e.g., "Brew've Been Served"), "Kalami Kafenion",
and "Hippokampos". Employees also visit "Ouzeri Elian" (a bar), sometimes
grocery stores (e.g., "General Grocer", "Kronos Mart"), and occasionally
"Gelatogalore".
Below are the normative patterns for four of the GAStech employees (Figures
MC2.1.1 to MC2.1.4).
Figure MC2.1.1: Engineering - Drill Technician. Name: Tempestad Brand
Figure MC2.1.2: Engineering - Drill Site Manager. Name: Onda Marin
Figure MC2.1.3: Engineering - Engineer. Name: Balas Felix
Figure MC2.1.4: Security - Perimeter Control. Name: Osvaldo Hennie
MC2.2 – Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe:
a. What is the pattern or event you observe?
b. An event occurs along "Rist Way" (near Chostus Hotel) and around "Spetson Park". The suspects spend time passing through "niovis st" and "exadakitiou way" on the "Rist Way" side, and on several streets around "Spetson Park" (listed under "What locations are involved?" below).
c. Who is involved?
d. The suspects are listed below by assigned car (Car ID), department, designation, and name.
Car ID: 11 - Engineering - Hydraulic Technician (Name: Cazar Gustav).
Car ID: 9 - Engineering - Drill Technician (Name: Calzas Axel).
Car ID: 3 - Engineering - Engineer (Name: Balas Felix).
Car ID: 16 - Security - Perimeter Control (Name: Vann Isia).
Car ID: 21 - Security - Perimeter Control (Name: Osvaldo Hennie).
Car ID: 26 - Engineering - Drill Site Manager (Name: Onda Marin).
Car ID: 14 - Engineering - Engineering Group Manager (Name: Dedos Lidelse).
Car ID: 34 - Security - Perimeter Control (Name: Vann Edvard).
Car ID: 33 - Engineering - Drill Technician (Name: Tempestad Brand).
Car ID: 24 - Security - Perimeter Control (Name: Mies Minke).
e. What locations are involved?
f. "niovis st", "exadakitiou way", "n estos st", "n utmana st", "n ketallinias st", "n ithakis st", "n oddisseos st"
g. When does the pattern or event take place?
h. The pattern occurs between the evening of 01/10 and 01/11. For each event that contributes to the pattern, the pattern (graph) is shown as a figure, followed by a table listing the movements (events) of the suspect involved.
Event 1: Figure MC2.2.1:
Event 1: Car ID: 21 - Security - Perimeter Control (Name: Osvaldo Hennie):
On 01/11 in the afternoon, spent over 6 hours at "n utmana st 3600 3698"
(near "Spetson Park"), passing via "niovis st" and "exadakitiou way".
Event 1: Table MC2.2.1: Car ID: 21
DateTime | Location | Comments |
1/10/14 17:30 | n souliou st 1500 1522 | |
1/11/14 3:25 | niovis st 3100 3198 | |
1/11/14 3:28 | exadakitiou way | |
1/11/14 3:31 | n utmana st 3600 3698 | night spent |
1/11/14 11:03 | exadakitiou way | |
Event 2: Figure MC2.2.2:
Event 2: Car ID: 16 - Security - Perimeter Control (Name: Vann Isia):
Late at night on 01/10, passed through "exadakitiou way", then spent 3 hours
around midnight at "n utmana st 3700 3798".
Event 2: Table MC2.2.2: Car ID: 16
DateTime | Location | Comments |
1/10/14 23:05 | exadakitiou way | |
1/10/14 23:21 | n utmana st 3700 3798 | |
1/11/14 3:23 | n utmana st 3700 3798 | 3-hours spent |
Event 3: Figure MC2.2.3:
Event 3: Car ID: 33 - Engineering - Drill Technician (Name: Tempestad Brand):
On 01/10, spent 4 hours late at night at "n ketallinias st 4600 4650" (near
"Spetson Park"), passing through "exadakitiou way" and "niovis st 2700 2798".
Around midnight, spent time between "niovis st" and "exadakitiou way".
Event 3: Table MC2.2.3: Car ID: 33
DateTime | Location | Comments |
1/10/14 19:30 | n ketallinias st 4600 4650 | |
1/10/14 23:37 | n ketallinias st 4600 4650 | around 4-hours spent |
1/10/14 23:41 | n estos st 3600 3698 | |
1/10/14 23:43 | exadakitiou way | |
1/10/14 23:45 | niovis st 2700 2798 | |
1/10/14 23:46 | n sirrakou st 601 2499 | |
1/11/14 19:38 | niovis st 2700 2798 | seen at the same location the next day (possible late-night stay) |
Event 4: Figure MC2.2.4:
Event 4: Car ID: 34 - Security - Perimeter Control (Name: Vann Edvard):
Passed through "niovis st" and "exadakitiou way". On 01/10, spent 4 hours
between "exadakitiou way" and "n estos st 3600 3698" (near Spetson Park).
Event 4: Table MC2.2.4: Car ID: 34
DateTime | Location | Comments |
1/10/14 17:29 | n tangno st 800 898 | |
1/11/14 14:12 | n kritis rd 2700 2898 | |
1/11/14 14:13 | niovis st 3100 3198 | |
1/11/14 14:13 | niovis st 3100 3198 | |
1/11/14 14:15 | exadakitiou way | |
1/11/14 14:15 | exadakitiou way | |
1/11/14 18:11 | n estos st 3600 3698 | near Spetson Park; 4-hours |
Event 5: Figure MC2.2.5:
Event 5: Car ID: 26 - Engineering - Drill Site Manager (Name: Onda Marin):
Passed through "niovis st" and "exadakitiou way". Around midnight on 01/10,
spent approximately 4 hours between "exadakitiou way" and
"n estos st 3600 3698" (near Spetson Park).
Event 5: Table MC2.2.5: Car ID: 26
DateTime | Location | Comments |
1/10/14 20:10 | n ketallinias st 4600 4650 | |
1/11/14 0:22 | n ketallinias st 4600 4650 | Mid-night around 4-hours spent |
Event 6: Figure MC2.2.6:
Event 6: Car ID: 11 - Engineering - Hydraulic Technician (Name: Cazar Gustav):
Spent around 5 hours at "n ketallinias st 4600 4650" (near Spetson
Park).
Event 6: Table MC2.2.6: Car ID: 11
DateTime | Location | Comments |
1/10/14 18:45 | n ketallinias st 4600 4650 | |
1/10/14 23:23 | n ketallinias st 4600 4650 | near mid-night spent around 4-hours |
Event 7: Figure MC2.2.7:
Event 7: Car ID: 24 - Security - Perimeter Control (Name: Mies Minke):
Spent time at night passing through "niovis st" and "exadakitiou way". Spent
around 3 hours between "n ithakis st 3700 3848" and
"n oddisseos st 3600 3698".
Event 7: Table MC2.2.7: Car ID: 24
DateTime | Location | Comments |
1/10/14 11:18 | n scarkeme st 2300 2598 | |
1/11/14 12:55 | niovis st 2700 2798 | |
1/11/14 13:39 | n ithakis st 3700 3848 | near Spetson Park |
1/11/14 16:15 | n oddisseos st 3600 3698 | About 3 hours spent; near Spetson Park |
1/11/14 16:18 | exadakitiou way | |
1/11/14 16:20 | niovis st 2900 2998 | |
Event 8: Figure MC2.2.8:
Event 8: Car ID: 3 - Engineering - Engineer (Name: Balas Felix):
Spent 5 hours around midnight at "n ketallinias st 4600 4650" (near
"Spetson Park").
Event 8: Table MC2.2.8: Car ID: 3
DateTime | Location | Comments |
1/10/14 19:03 | n ketallinias st 4600 4650 | |
1/11/14 0:29 | n ketallinias st 4600 4650 | Mid-night spent |
1/11/14 00:29:28 | n omirou st 4700 4798 | |
Event 9: Figure MC2.2.9:
Event 9: Car ID: 9 - Engineering - Drill Technician (Name: Calzas Axel):
On 01/10, spent 4 hours near midnight at "n ketallinias st 4600 4650"
(near Spetson Park).
Event 9: Table MC2.2.9: Car ID: 9
DateTime | Location | Comments |
1/10/14 19:11 | n ketallinias st 4600 4650 | |
1/10/14 19:12 | n ketallinias st 4600 4650 | |
1/10/14 23:55 | n ketallinias st 4600 4650 | |
1/10/14 23:55 | n ithakis st 4500 4598 | |
1/11/14 19:34 | n ithakis st 3700 3848 | may be late-night stay |
Event 10: Figure MC2.2.10:
Event 10: Car ID: 14 - Engineering - Engineering Group Manager (Name: Dedos Lidelse):
On 01/10, spent 4 hours late at night at "n ketallinias st", then passed
through "niovis st".
Event 10: Table MC2.2.10: Car ID: 14
DateTime | Location | Comments |
1/10/14 18:59 | n ketallinias st 4600 4650 | |
1/10/14 23:30 | n ketallinias st 4600 4650 | around 5-hours spent |
1/10/14 23:38 | niovis st 2900 2998 | |
1/10/14 23:38 | niovis st 2700 2798 | |
1/12/14 12:31 | niovis st 2900 2998 | 01/11 data missing |
i. Why is this pattern or event significant?
j. The pattern is significant because at least 8 employees move around locations near "Spetson Park" and "Chostus Hotel", which are away from the office and from regular eating places.
k. What is your level of confidence about this pattern or event? Why?
l. Very few employees visit these locations, which do not appear to be related to work, eating, or shopping. These suspicious patterns of movement are rare in the data.
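The durations quoted in the event tables can be checked by differencing consecutive records at the same location. The following is a minimal sketch, assuming timestamps in the `month/day/year hour:minute` form used in the tables; the `dwell_hours` helper is a hypothetical name, and the sample rows are copied from Table MC2.2.6 and Table MC2.2.8.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M"  # assumed timestamp format, e.g. "1/10/14 18:45"


def dwell_hours(records):
    """Yield (location, hours) for consecutive records at the same location.

    `records` is a time-ordered list of (timestamp_string, location) pairs.
    """
    for (t0, loc0), (t1, loc1) in zip(records, records[1:]):
        if loc0 == loc1:
            delta = datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)
            yield loc0, delta.total_seconds() / 3600


# Rows taken from Table MC2.2.6 (Car ID 11) and Table MC2.2.8 (Car ID 3).
car_11 = [
    ("1/10/14 18:45", "n ketallinias st 4600 4650"),
    ("1/10/14 23:23", "n ketallinias st 4600 4650"),
]
car_3 = [
    ("1/10/14 19:03", "n ketallinias st 4600 4650"),
    ("1/11/14 0:29", "n ketallinias st 4600 4650"),
]

stays = {"car 11": list(dwell_hours(car_11)), "car 3": list(dwell_hours(car_3))}
for car, result in stays.items():
    for location, hours in result:
        print(f"{car}: {hours:.1f} h at {location}")
```

This treats two consecutive records at one location as a single stay, which is a simplification: it cannot tell whether the car left and returned between the two records.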
MC2.3 – Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data. Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2.
We ran a simple script that calculates statistics on the data, such as the
number of distinct days and the number of GPS (car) movements recorded for each
employee. By analyzing these statistics, we found missing data for a few
employees, as described below.
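A script of this kind might look like the following sketch. It is not the script used for the submission: the `summarize` and `missing_days` helpers are hypothetical, the input is assumed to be a list of `(timestamp, car_id)` pairs, and the sample rows echo a few entries from Table MC2.2.10 and Table MC2.2.1.

```python
from collections import defaultdict
from datetime import datetime


def summarize(rows):
    """Return {car_id: (number_of_distinct_days, number_of_records)}."""
    days = defaultdict(set)
    counts = defaultdict(int)
    for timestamp, car_id in rows:
        days[car_id].add(timestamp.date())
        counts[car_id] += 1
    return {car: (len(days[car]), counts[car]) for car in counts}


def missing_days(rows, car_id, expected_days):
    """Days in `expected_days` on which `car_id` has no GPS record."""
    seen = {ts.date() for ts, car in rows if car == car_id}
    return sorted(set(expected_days) - seen)


# Tiny sample; real input would be the full challenge GPS log.
rows = [
    (datetime(2014, 1, 10, 18, 59), 14),
    (datetime(2014, 1, 10, 23, 30), 14),
    (datetime(2014, 1, 12, 12, 31), 14),
    (datetime(2014, 1, 10, 17, 30), 21),
    (datetime(2014, 1, 11, 3, 25), 21),
]
stats = summarize(rows)
expected = [datetime(2014, 1, d).date() for d in (10, 11, 12)]
gaps = missing_days(rows, 14, expected)

for car, (n_days, n_records) in sorted(stats.items()):
    print(f"car {car}: {n_days} distinct days, {n_records} records")
print("car 14 missing:", [d.isoformat() for d in gaps])
```

Cars with unusually few distinct days or records relative to the others are the ones worth inspecting by hand.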
1) Badging Office (Herrero Kanon)
Data for 01/06 is missing (only one GPS location was recorded), and 01/07 has
fewer GPS recordings than usual. For this employee, only morning and evening
data are recorded. Similarly, for "Frente Vira", the data for 01/15 starts at
12:18, although one would expect to see the employee at least travel from home
to work.
2) Perimeter Control (Osvaldo Hennie)
Data for 01/12 is missing for this employee. Because the most important
suspicious event we look for falls between 01/10 and 01/11, we still conclude
that the employee is suspicious.
3) Engineering Group Manager (Dedos Lidelse)
01/11 falls on a weekend. Data for 01/11 is missing for this employee, either
because the employee did not move at all that day or because the data is
missing (imperfect), as mentioned in the questionnaire.
4) IT Group Manager (Bergen Linnea)
01/12 falls on a weekend. Data for that day is missing for this employee,
either because the employee did not move at all that day or because the data is
missing (imperfect), as mentioned in the questionnaire.
5) Outliers
For Vasco-Pais Willem, GBAD flags "n utmana st", which is one of the suspected
locations, but the employee visits this location regularly, so we conclude that
the employee is not a suspect. Similarly, for Strum Orhan, GBAD flags
"n mikonou st" on 01/19 as an anomaly. This location is close to "Spetson
Park", but it does not intersect with our suspicious events between 01/10 and
01/11, so we also conclude that this employee is not a suspect.
Figure MC2.3.1: Vasco-Pais Willem on 01/11.
Figure MC2.3.2: Strum Orhan on 01/19.
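The two exclusion rules applied to these outliers can be written down as a small filter. This is a sketch of the reasoning only: the `keep_as_suspect` helper is hypothetical, the threshold for "visits regularly" is an assumed value, and the visit days in the example are illustrative and not taken from the data.

```python
from datetime import date

# Window of the suspicious events reported under MC2.2.
WINDOW = (date(2014, 1, 10), date(2014, 1, 11))
REGULAR_VISIT_DAYS = 3  # hypothetical threshold for "visits regularly"


def keep_as_suspect(anomaly_day, visit_days):
    """Decide whether a flagged location keeps the employee on the suspect list.

    anomaly_day: day on which the location was flagged as anomalous.
    visit_days:  set of all days on which the employee visited that location.
    """
    if len(visit_days) >= REGULAR_VISIT_DAYS:
        return False  # a habitual stop, not a one-off event
    if not (WINDOW[0] <= anomaly_day <= WINDOW[1]):
        return False  # outside the 01/10 to 01/11 window
    return True


# Illustrative inputs only; the visit days are not taken from the data.
regular_visitor = keep_as_suspect(
    date(2014, 1, 11), {date(2014, 1, d) for d in (7, 8, 9, 11)}
)
late_anomaly = keep_as_suspect(date(2014, 1, 19), {date(2014, 1, 19)})
one_off_in_window = keep_as_suspect(date(2014, 1, 10), {date(2014, 1, 10)})
print(regular_visitor, late_anomaly, one_off_in_window)
```

The first case mirrors a regular visitor to a suspected location, and the second mirrors an anomaly that falls outside the window of interest.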